SPECIFICATION
MARKER GENE FOR ARTHRORHEUMATISM TEST 

Technical Field 

The present invention relates to rheumatoid arthritis 
susceptibility genes identified de novo by a gene mapping 
method using microsatellite polymorphic markers, and to use 
thereof. 

Background Art 

Arthrorheumatism (rheumatoid arthritis: RA) is a chronic
inflammatory disease characterized by autoimmunity. RA,
which exhibits progressive inflammation with synovial cell
overproliferation in joints, is pathologically classified
as a joint tissue disease. The prevalence of RA is high,
reaching approximately 1% of the population across various
races. The familial aggregation and monozygotic twin
concordance rates of RA have previously been reported to be
relatively high, suggesting the presence of a hereditary
factor in its pathogenesis. Indeed, it is known that in the
family of a proband with RA, a closer relative of the proband
has a higher risk of recurrence. According to previous reports,
the relative risk of the disease in the siblings (λs) of the
proband falls within 2 to 10.
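
As a worked illustration of the sibling relative risk, and assuming the usual definition of λs as the ratio of the risk in siblings of a proband (K_s) to the population prevalence (K), a definition that this specification does not spell out, the figures above imply:

```latex
\lambda_s = \frac{K_s}{K}, \qquad K \approx 0.01,\quad 2 \le \lambda_s \le 10
\;\Longrightarrow\; K_s = \lambda_s K \approx 0.02 \text{ to } 0.10
```

Under this assumption, a sibling of a proband would carry a risk of roughly 2% to 10%, against roughly 1% in the general population.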

Among RA susceptibility genes previously found, the
HLA-DRB1 locus in the HLA class II region on 6p21.3 has been
thought to contribute most strongly to RA and is estimated to
account for 30 to 50% of the total genetic risk. Conversely,
this also suggests the presence of other, undiscovered genes
whose genetic contribution is as strong as that of HLA-DRB1. Some
of such other genes have been considered to reside in the HLA
region and to be in linkage with HLA-DRB1. Many researchers have
continuously conducted studies to identify those other genes
by various approaches including genomewide linkage analysis
such as sib-pair analysis (Non-Patent Documents 1 to 3) and
genetic association analysis such as case-control analysis
(case-control study) on candidate genes or chromosome regions
(Non-Patent Documents 4 to 6). However, these studies fell
short of identifying all RA susceptibility genes
and fully explaining the mechanisms of its onset.

An approach that examines the association between bases
exhibiting single nucleotide polymorphisms (SNPs) in the human
genomic DNA sequence and disease has received attention as
a method for identifying novel disease-related genes or the
like. However, SNPs are derived from one-nucleotide
substitutions on the genome and therefore generally have only
two alleles. In this approach, since only those SNPs
present within approximately 5 kb of a
disease-related gene to be mapped exhibit association, genome
mapping with SNPs as polymorphic markers requires assigning
an enormous number of SNPs as markers for analysis. Under
the present circumstances, this approach is therefore applied
only to a limited region narrowed down to some extent. On
the other hand, a microsatellite polymorphic marker has many
alleles and is characterized in that it exhibits association
even at some distance from a gene to be mapped. However,
microsatellite polymorphic markers present the problem
that assigning too many markers makes analysis
difficult in terms of time and labor, as with SNPs, while
assigning too few markers makes marker spacings too
large, so that a disease-related gene might be overlooked.

The present inventors have developed a gene mapping method
using microsatellite polymorphic markers assigned at
approximately 50-kb to 150-kb intervals on average and have
found that a region where a disease-related gene or a gene
relating to human phenotypes with genetic factors is present
can be identified at high efficiency and low cost by using
the method (Patent Document 1).
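
The trade-off between marker spacing and marker count described above can be made concrete with a rough calculation. The sketch below is illustrative only: the genome size of about 3.0 x 10^9 bp is an assumption not stated in this specification, and the function name is hypothetical.

```python
import math

# Assumption: approximate size of the human genome in base pairs.
ASSUMED_GENOME_BP = 3.0e9


def markers_needed(genome_bp: float, spacing_bp: float) -> int:
    """Number of evenly spaced markers needed to cover genome_bp at spacing_bp."""
    return math.ceil(genome_bp / spacing_bp)


# Compare the 50-kb to 150-kb spacings mentioned in the text.
for spacing_kb in (50, 100, 150):
    n = markers_needed(ASSUMED_GENOME_BP, spacing_kb * 1_000)
    print(f"{spacing_kb} kb spacing -> about {n:,} markers")
```

Under this assumption, a spacing of about 100 kb corresponds to roughly 30,000 markers, which is the same order of magnitude as the marker set described in the Example.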

Non-Patent Document 1: Cornelis, F. et al., Proc. Natl.
Acad. Sci. USA, 95, 10746 (1998)

Non-Patent Document 2: Shiozawa, S. et al., Int. Immunol.,
10, 1891 (1998)

Non-Patent Document 3: Jawaheer, D. et al., Am. J. Hum.
Genet., 68, 927 (2001)

Non-Patent Document 4: Okamoto, K. et al., Am. J. Hum.
Genet., 72, 303 (2003)

Non-Patent Document 5: Suzuki, A. et al., Nat. Genet.,
34, 395 (2003)

Non-Patent Document 6: Tokuhiro, S. et al., Nat. Genet.,
35, 341 (2003)

Patent Document 1: International Publication
WO 01/79482

Disclosure of the Invention 

Accordingly, an object of the present invention is to
identify novel RA susceptibility genes by applying, for the
first time to the multifactorial disorder RA, a precise
mapping method with microsatellite markers that is capable of
completely identifying disease susceptibility genes at higher
cost efficiency than conventional approaches of SNP
association analysis. A further object of the present invention is to
eventually develop effective prevention/treatment of RA
by collecting data on RA pathogenesis or onset mechanisms on
the basis of information on the identified RA
susceptibility genes or RA-related proteins as expression
products of the genes and performing proper screening.

In the present invention, a gene mapping method using
microsatellite (hereinafter referred to as "MS") markers was used
to identify novel RA susceptibility genes whose associations
with RA had not been known so far.

The RA susceptibility genes identified de novo by the
present invention are the TNXB and NOTCH4 genes (chromosome 6)
as well as the RAB6A, MRPL48, FLJ11848, UCP2, and UCP3 genes
(chromosome 11) in the human genomic DNA sequence. The present
inventors conducted association analysis of RA with SNPs
present in the genomic DNA sequences of these de
novo-identified genes, and found statistically significant
association for the first time.

Thus, in the first aspect, the present invention provides
a marker gene for arthrorheumatism test consisting of a
consecutive partial DNA sequence comprising at least one base
exhibiting single nucleotide polymorphism present in a TNXB,
NOTCH4, RAB6A, MRPL48, UCP2 or UCP3 gene in the human genomic
DNA sequence, or of a complementary strand of the partial DNA
sequence.

In the second aspect, the present invention provides a
test method and test kit for RA using the marker gene.

Brief Description of the Drawings 

Figure 1 is a diagram showing the positions where MS
markers used in the Example of the present application are mapped
on chromosomes;

Figure 2 is a diagram showing the mapping and P-values
of MS markers used in the first-phase screening. The P-values
of 133 MS markers exhibiting significance are indicated by
circles (o);

Figure 3 is a diagram showing the positions where MS and
SNP markers selected in the Example of the present application
are mapped on chromosomes, blocks predicted by EM and Clark
algorithms, and P-values for allele frequency;

Figure 4 is a diagram showing the distribution of tissue
expression of RA susceptibility genes identified by the present
invention, and so on;

Figure 5-1 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-2 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-3 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-4 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-5 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-6 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-7 is a list showing information on the
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-8 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and
primers used in the present invention; 

Figure 5-9 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-10 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-11 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-12 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-13 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-14 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-15 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-16 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-17 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-18 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-19 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-20 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-21 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-22 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-23 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-24 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-25 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-26 is a list showing information on the 
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention; 

Figure 5-27 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-28 is a list showing information on the 
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-29 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-30 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-31 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-32 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-33 is a list showing information on the
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-34 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-35 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-36 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-37 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-38 is a list showing information on the 
designations (left in each column) and Genbank registration 
numbers (right in each column) of microsatellite markers and 
primers used in the present invention; 

Figure 5-39 is a list showing information on the 
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-40 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention;

Figure 5-41 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention; and

Figure 5-42 is a list showing information on the
designations (left in each column) and Genbank registration
numbers (right in each column) of microsatellite markers and
primers used in the present invention.

Best Mode for Carrying Out the Invention 

A gene mapping method used in the present invention is
the method described in Patent Document 1. Specifically,
this method comprises: using forward and reverse primers
corresponding to each of consecutive DNA sequences
comprising MS polymorphic markers assigned at given intervals,
preferably approximately 100-kb intervals, on the human genome
to amplify the DNA sequence samples by polymerase chain
reaction (PCR); performing electrophoresis on a high-resolution
gel such as in a DNA sequencer; and measuring and analyzing the
microsatellite polymorphic marker-containing DNA sequence
fragments, which are the amplification products.

MS polymorphic markers exhibiting false positives can be
decreased drastically without forced correction by adopting
multi-phased screening that involves performing a first
(first-phase) screening using forward and reverse primers
corresponding to MS polymorphic markers assigned genomewide
and performing a second (second-phase) screening on the MS
polymorphic markers that were positive in the first screening
by use of a different sample population.
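
The way multi-phased screening suppresses false positives can be illustrated with a back-of-the-envelope calculation. The sketch below is an idealized model, not the inventors' procedure: it assumes that every marker is unassociated with the disease, that the sample populations are independent, and that the same significance threshold is applied in every phase. The function name is hypothetical; the marker count and threshold are taken from the Example.

```python
def expected_false_positives(n_markers: int, alpha: float, phases: int) -> float:
    """Expected number of null markers that pass `phases` independent tests.

    Assumes each null marker passes a single phase with probability `alpha`
    and that the phases use independent sample populations.
    """
    return n_markers * alpha ** phases


N_MARKERS = 27_158  # marker count given in the Example
ALPHA = 0.05        # per-phase threshold given in the Example (P < 0.05)

for phase in (1, 2, 3):
    expected = expected_false_positives(N_MARKERS, ALPHA, phase)
    print(f"after phase {phase}: about {expected:.1f} false positives expected")
```

Under these assumptions, each additional phase reduces the expected number of chance positives by a factor of 20 without applying a multiple-testing correction to the individual P-values.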

The position of a target gene is restricted by the
multi-phased screening using MS. Then, candidate regions or
gene loci can be further determined in detail by another gene
mapping method. For example, analysis using SNPs is effective
for this purpose. Specifically, the polymorphism frequencies
of SNPs in the candidate regions that appear to contain the target
gene are compared, for example by association analysis, between
populations of patients and normal individuals, and SNP markers
in linkage disequilibrium can be detected by haplotype
analysis and linkage disequilibrium analysis.
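
A comparison of SNP allele frequencies between patients and normal individuals can be sketched as a test on a 2x2 table of allele counts. The example below uses Fisher's exact test, which the Example names for the MS screening; applying it to SNP allele counts here is an illustrative choice, and the counts and function name are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P-value for the table [[a, b], [c, d]].

    Rows are groups (e.g. cases, controls); columns are the two alleles.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical allele counts: (allele 1, allele 2) in cases and in controls.
p_value = fisher_exact_2x2(120, 80, 90, 110)
print(f"two-sided P = {p_value:.4g}")
```

The exact test is used here because it needs no large-sample approximation; a chi-square test on the same table would be a common alternative.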

To identify RA susceptibility genes, the present
invention adopted a previously reported pooled DNA method as
a cost-efficient screening method using 27,158 MS
markers including 20,755 newly established loci. The genome
association analysis was conducted by a three-phased screening
method involving three major steps as described above: (1)
three-phased genomic screening for reducing the type I error
rate; (2) confirmation of the association found in pools by
individual genotyping at positive MS loci; and (3)
identification by detailed individual genotyping of SNP
markers in the neighborhoods of candidate regions in the screened
and additional populations.

The association analysis of the whole genome demonstrated
the strongest association for the HLA-DRB1 gene, which has
previously been known to be associated with RA
(P=9.7x10^-[exponent illegible]).
Furthermore, strong association was observed, independently
of HLA-DRB1, in the NOTCH4 (P=1.1x10^-[exponent illegible]) and
TNXB (P=7.6x10^-[exponent illegible]) genes
on chromosome 6, which also carries HLA-DRB1. Moreover, novel
association was found in a mitochondria-related gene cluster
on 11q13.4 containing mitochondrial ribosomal protein L48
(MRPL48) and two mitochondrial proteins called uncoupling
proteins (UCP2 and UCP3). Weak association was seen on 10p13
and 14q23.1. In addition to these novel associations,
association was confirmed in the IkBL (Non-Patent Document 4) and
PADI4 (Non-Patent Document 5) genes, which have already been
reported to be associated with RA, as with HLA-DRB1.

Namely, a statistically significant difference in the allele
frequencies of SNPs present in the TNXB, NOTCH4, RAB6A, MRPL48,
UCP2, and UCP3 genes found de novo to have association was
observed between RA patients and normal individuals. Thus,
a consecutive partial DNA sequence comprising at least one
base exhibiting single nucleotide polymorphism present in any
of these gene regions, or a complementary strand of the partial
DNA sequence, can be utilized as a marker gene for
arthrorheumatism test.

Specifically, it is preferred that the base exhibiting
single nucleotide polymorphism should be selected from the 
group consisting of: 

the 61st base in SEQ ID NO: 1 or a corresponding base 
on a complementary strand thereof; 

the 61st base in SEQ ID NO: 2 or a corresponding base 
on a complementary strand thereof; 

the 61st base in SEQ ID NO: 3 or a corresponding base 
on a complementary strand thereof; 

the 61st base in SEQ ID NO: 4 or a corresponding base 
on a complementary strand thereof; 

the 401st base in SEQ ID NO: 5 or a corresponding base 
on a complementary strand thereof; 

the 495th base in SEQ ID NO: 6 or a corresponding base 
on a complementary strand thereof; 

the 61st base in SEQ ID NO: 7 or a corresponding base 
on a complementary strand thereof; 

the 61st base in SEQ ID NO: 8 or a corresponding base 
on a complementary strand thereof; 

the 61st base in SEQ ID NO: 9 or a corresponding base 
on a complementary strand thereof; 

the 61st base in SEQ ID NO: 10 or a corresponding base 
on a complementary strand thereof; 

the 401st base in SEQ ID NO: 11 or a corresponding base 
on a complementary strand thereof; 

the 401st base in SEQ ID NO: 12 or a corresponding base 
on a complementary strand thereof; 

the 401st base in SEQ ID NO: 13 or a corresponding base
on a complementary strand thereof; 

the 503rd base in SEQ ID NO: 14 or a corresponding base 
on a complementary strand thereof; 

the 201st base in SEQ ID NO: 15 or a corresponding base 
on a complementary strand thereof; 

the 511th base in SEQ ID NO: 16 or a corresponding base 
on a complementary strand thereof;

the 201st base in SEQ ID NO: 17 or a corresponding base 
on a complementary strand thereof; 

the 51st base in SEQ ID NO: 18 or a corresponding base 
on a complementary strand thereof; 

the 61st base in SEQ ID NO: 19 or a corresponding base 
on a complementary strand thereof; 

the 497th base in SEQ ID NO: 20 or a corresponding base 
on a complementary strand thereof; 

the 201st base in SEQ ID NO: 21 or a corresponding base 
on a complementary strand thereof; and 

the 201st base in SEQ ID NO: 22 or a corresponding base 
on a complementary strand thereof. 

SEQ ID NOs: 1 to 5 represent partial sequences of the
TNXB gene, SEQ ID NOs: 6 to 13 represent partial sequences
of the NOTCH4 gene, SEQ ID NO: 14 represents a partial sequence
of the RAB6A gene, SEQ ID NOs: 15 to 18 represent partial
sequences of the MRPL48 gene, SEQ ID NOs: 19 and 20 represent
partial sequences of the FLJ11848 gene, SEQ ID NO: 21 represents
a partial sequence of the UCP2 gene, and SEQ ID NO: 22 represents
a partial sequence of the UCP3 gene.

These marker genes can be used in genetic testing for RA.

For example, the consecutive DNA sequence comprising the
base exhibiting single nucleotide polymorphism is amplified,
for example by PCR, using forward and reverse primers
positioned so that the base exhibiting single nucleotide
polymorphism lies between them. The nucleotide sequences of the
obtained DNA fragments can be determined and compared with
the corresponding nucleotide sequence determined from a normal
individual to thereby test for the presence or absence of a genetic
factor for RA.
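
The comparison step can be sketched as reading the base at the polymorphic position, counted from 1 as in the list above (for example, "the 61st base in SEQ ID NO: 1"), in both the test subject's sequence and the reference sequence. The sequences, position, and function names below are hypothetical and serve only to illustrate the indexing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_at(seq: str, position: int) -> str:
    """Return the base at a 1-based position in a nucleotide sequence."""
    if not 1 <= position <= len(seq):
        raise ValueError("position lies outside the sequence")
    return seq[position - 1].upper()


def differs_from_reference(sample: str, reference: str, position: int) -> bool:
    """True if the sample carries a different base than the reference there."""
    return base_at(sample, position) != base_at(reference, position)


# Hypothetical short sequences; real marker sequences are far longer.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAC"
snp_position = 5  # 1-based, hypothetical

print(base_at(reference, snp_position), "->", base_at(sample, snp_position))
print("differs:", differs_from_reference(sample, reference, snp_position))
# The corresponding base on the complementary strand:
print(base_at(sample, snp_position).translate(COMPLEMENT))
```

A practical test would also have to handle heterozygous calls and alignment of the amplified fragment to the marker sequence, which this sketch leaves out.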

The forward primer used in the test is a primer having
the same nucleotide sequence as a sequence extending in the
3'-end direction from the 5' end of the DNA sequence of the
marker gene containing the base exhibiting single nucleotide
polymorphism, which has been mapped on the human genome, and
includes those of 15 to 100 bases, preferably 15 to 25 bases,
more preferably 18 to 22 bases, in length. The reverse primer
is a primer having a nucleotide sequence complementary to a
sequence extending in the 5'-end direction from the 3' end
of the DNA sequence of the marker gene, and those of 15 to
100 bases, preferably 15 to 25 bases, more preferably 18 to
22 bases, in length can be used as the reverse primer.
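
The primer definitions above translate directly into string operations: the forward primer copies the 5' end of the marker sequence, and the reverse primer is the reverse complement of its 3' end. The following is a minimal sketch under that reading; the marker sequence and function names are hypothetical, and it ignores practical design criteria such as melting temperature and self-complementarity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(marker_seq: str, length: int = 20) -> tuple[str, str]:
    """Derive forward and reverse primers from the two ends of a marker.

    `length` must lie in the 15 to 100 base range given in the text.
    """
    if not 15 <= length <= 100:
        raise ValueError("primer length must be 15 to 100 bases")
    if len(marker_seq) < 2 * length:
        raise ValueError("marker sequence too short for non-overlapping primers")
    forward = marker_seq[:length].upper()
    reverse = reverse_complement(marker_seq[-length:])
    return forward, reverse


# Hypothetical 40-base marker sequence, for illustration only.
marker = "AAAACCCCGGGGTTTTACGTACGTACGTAAAACCCCGGGG"
fwd, rev = design_primers(marker, length=18)
print("forward:", fwd)
print("reverse:", rev)
```

Because both primers are derived from the ends of the marker sequence, the polymorphic base lies between them, as the test method requires.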

Examples of primers for amplifying the marker genes having
the DNA sequences of SEQ ID NOs: 1 to 22 include those having
DNA sequences represented by SEQ ID NOs: 23 to 66. The
relationship of their correspondence is as follows:

Marker gene    Forward primer    Reverse primer

Alternatively, the presence or absence of a hereditary
factor for RA can also be examined by using the marker genes
of the present invention as probes to screen a DNA sample from
a test subject, determining a nucleotide sequence of the 
obtained DNA of the test subject, and then comparing the 
sequence with a sequence from a normal individual. 

In this context, the probe used may be the marker gene
of the present invention itself or may be the consecutive DNA
sequence comprising the base exhibiting single nucleotide
polymorphism present in the marker gene, a complementary strand
thereof, or a sequence that hybridizes with them. A probe
of 15 to 100 bases, preferably 15 to 25 bases, more preferably
18 to 22 bases, in length can be used.

On the other hand, coding regions of the RA
susceptibility genes can be determined by determining the
full-length nucleotide sequences of the RA susceptibility
genes on the basis of the TNXB, NOTCH4, RAB6A, MRPL48, UCP2 and
UCP3 genes found de novo to have association. As a result,
amino acid sequences of proteins encoded by the genes can be 
identified. Since proteins with these amino acid sequences 
are highly likely to participate in RA pathogenesis or onset 
mechanisms, RA can be prevented or treated by promoting or 
inhibiting the functions of these proteins. 

Thus, the present invention also relates to a screening
method using the proteins. Substances promoting or inhibiting
the functions of the proteins, that is, agonists or antagonists,
can be identified by the screening method. The antagonist
used herein encompasses not only small chemical molecules but
also biologically relevant substances such as antibodies,
antibody fragments, and antisense oligonucleotides. These
agonists or antagonists are effective as diagnostic,
preventive, and/or therapeutic drugs for RA.

The protein described above may be produced in a 
transformed cell obtained by preparing a vector comprising 
a DNA sequence containing at least the coding region of any 
of the marker genes identified by the present invention, and 
then transforming the vector into an appropriate host cell. 

Example 

Microsatellite (MS) Detection and PCR Primer Design:
MS sequences with 2-, 3-, 4-, 5-, or 6-base repeat units
were detected with the Apollo program applicable to Sputnik in
four versions of the human genome draft sequences from Golden
Path Oct. 2000 to NCBI build 30. PCR primers for amplifying
these repeats under single reaction conditions were
automatically designed with the Discover program applicable to
Primer Express. To prevent differential amplification, these
PCR primers were designed to contain no SNP in their sequences
(Sham et al., 2002).

A pattern with a number of peaks exhibiting the
polymorphisms of MS markers in a pool of Japanese (Barcellos,
L. F. et al., Am. J. Hum. Genet., 61, 734 (1997)) was compared
with that of European pools. As a result, individual
polymorphic MS markers in the Japanese pool exhibited a
different pattern from that of two European pools (data not
shown). The result of the comparison between the races showed
that the pattern with a number of peaks in the Japanese pool
reflects polymorphism in MS length and is not experimental
error.

In the present invention, 27,158 polymorphic MS markers
were assigned and mapped on the human draft sequence (NCBI
build 30) (Figure 1). Among these markers, 20,755 markers
were assigned de novo by the present inventors, while the remaining
6,403 were known markers such as Genethon and CHLC markers.
The average heterozygosity and average allele number of 27,039
markers, excluding 119 markers mapped on the Y chromosome, were
0.67±0.16 and 6.4±3.1, respectively. The average marker
spacing thereof was 108.1 kb (SD=64.5 kb; max=930.1 kb) (see
Table 1). These markers can detect linkage disequilibrium
up to approximately 50 kb distant from a disease locus at a
rough estimate. Accordingly, these markers were used to
conduct case-control association analysis on RA. Those
27,039 microsatellites and the primer sequences used for their
amplification were deposited in Genbank under the registration
numbers listed in Figure 5.
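
For readers unfamiliar with the heterozygosity statistic quoted above, the sketch below computes the expected heterozygosity of a single marker from its allele frequencies as one minus the sum of squared frequencies. This specification does not state which estimator was used, so the formula is an assumption, and the allele frequencies and function name are hypothetical.

```python
def expected_heterozygosity(allele_freqs: list[float]) -> float:
    """Expected heterozygosity H = 1 - sum(p_i ** 2) for one marker.

    Assumes random mating; allele_freqs must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical six-allele microsatellite, close to the average allele
# number of about 6.4 reported in the text.
freqs = [0.40, 0.25, 0.15, 0.10, 0.05, 0.05]
print(f"H = {expected_heterozygosity(freqs):.3f}")
```

Markers with many alleles of comparable frequency give values near 1, which is why multi-allelic microsatellites are more informative per marker than two-allele SNPs.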

[Table 1]

Microsatellite markers were mapped on the NCBI build 30.



In the present invention, 940 test subjects with RA (case
population) and the same number of normal test subjects
(control population) were adopted. With the permission of the
ethical committee of each organization associated with the
present invention, informed consent was obtained from each
test subject in the case and control populations used in this
analysis. RA phenotypes were determined according to the
American Rheumatism Association diagnostic criteria for RA.
All personal data associated with medical information and blood
samples were carefully discarded in the organization which
collected them.

The average age at disease onset in the case population was
47.7±13.1 years, with a sex ratio of 1:4 (male:female).
The average age and sex ratio of the case and control populations
were matched as closely as possible. The sexes of all samples
involved were confirmed by amelogenin (enamel protein)
genotyping (Akane, A. et al., Forensic Sci. Int., 49, 81
(1991)). A preliminary PCR test for checking DNA levels was
conducted by PCR direct sequencing as previously reported
(Voorter, C. E. et al., Tissue Antigens, 49, 471 (1997)), while
HLA-DRB1 genotypes were examined.

DNA Sample Preparation and Typing: 

DNA was extracted with the QIAamp DNA blood kit (QIAGEN) from
the sample of each test subject in the populations under
standardized conditions for preventing variations in DNA level.
Subsequently, to check for DNA degradation and RNA contamination,
0.8% agarose gel electrophoresis was performed. After
optical density measurement for checking protein
contamination, the DNA concentration was determined by three
measurements using the PicoGreen fluorescence assay (Molecular
Probes) as previously described (Collins, H. E. et al., Hum.
Genet., 106, 218 (2000)). The standardized pipetting and
dispensation of the DNA samples were performed with robots
such as Biomek 2000 and Multimek 96 (Beckman).



The pooled DNA template for typing two groups of 
approximately 30,000 MS markers was prepared simultaneously 
with or immediately after the DNA quantification. The pooled 
DNA level was further tested by comparing allelic distribution 
between individuals and pooled typing results using three MS 
markers. After this test, approximately 30,000 PCR reaction 
mixtures containing all the components except for the PCR 
primers were prepared and subsequently dispensed to 96-well 
PCR reaction plates, followed by storage until use. 

After the PCR reaction, pooled MS typing and individual
genotyping were conducted according to standard protocols
using an ABI3700 DNA analyzer (Applied Biosystems). The pooled
DNA typing could maintain constant accuracy throughout the
experiment by using the standardized preparation method.
Various data such as peak positions and heights were
automatically read by the PickPeak and MultiPeaks programs
developed by Applied Biosystems Japan from the multipeak
pattern in the chromatograph files, that is, ABI fsa files.

Three-Phased Genome Screening by Pooled DNA Method: 

A population of 375 individuals with RA (case) and the 
same number of unaffected individuals (control) were each 
divided equally into three sets, giving three pairs of case 
and control populations (125 individuals each). A population 
stratification test was conducted using 22 randomly selected 
microsatellites, a number at least sufficient for testing 
population stratification according to Pritchard's method 
(Pritchard, J. K. and Rosenberg, N. A., Am. J. Hum. Genet., 65, 
220 (1999)). The results showed the absence of any significant 
stratification in either the case or the control populations 
(Table 2). Preventing false association by means of the 
population stratification test is very important for 
late-onset diseases such as RA (Risch 2000), for which 
the collection of internal controls is difficult. 

[Table 2] 

After the population stratification test, three pooled 
DNA templates from each case or control population were used 
in three-phased genomic screening. This screening method 
simply means replication in three independent sample 
populations and is known to be suitable for excluding many 
false positives due to Type I errors caused by multiple testing 
(Barcellos, L. et al., Am. J. Hum. Genet., 61, 724 (1997)). 
The first (first-phase) screening indicated that 2,847 MS 
markers were statistically significant (P<0.05) by 
Fisher's exact test for either 2x2 or 2xm contingency tables 
(m = the number of alleles). The subsequent second 
(second-phase) screening indicated that, of these 2,847 
markers, 372 MS markers were significant. After the further 
third (third-phase) screening, 133 positive MS markers were 
obtained. These results are shown in Table 3. 
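
As an illustration of the 2x2 case of the Fisher's exact test named 
above, the following is a minimal Python sketch and not the software 
used in this Example. The function name, the two-sided rule (summing 
all tables no more likely than the observed one), and the allele 
counts are assumptions made for illustration only. 

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows are the case and control populations; columns are the allele of
    interest and all other alleles pooled. The P-value is the total
    probability of every table with the same margins that is no more
    likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x copies of the allele among cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical allele counts on 250 chromosomes per population
# (125 individuals x 2); these numbers are not taken from the tables.
p_value = fisher_exact_2x2(60, 190, 35, 215)
```

A marker would be carried forward to the next phase only when such a 
P-value falls below 0.05 in the population being screened. 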



[Table 3] 

The number of positive MS markers was larger than 
statistically expected, suggesting that experimental errors 
caused by the pooled DNA method were included therein, as 
previously reported (Shaw, S. H. et al., Genome Res., 8, 111 
(1998); and Sham, P. et al., Nat. Rev. Genet., 3, 862 (2002)). 
Thus, we carefully verified these positive markers by 
individual genotyping in the screened populations. As a 
result, 47 markers were significant. Of these markers, 25 
were excluded due to their low positive allele frequencies 
(<0.05), resulting in a list of 23 positive MS markers (Table 
4). 
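
The statement that more markers remained than statistically expected 
can be checked with simple arithmetic. The sketch below assumes, for 
illustration only, exactly 30,000 markers, no true association at any 
marker, and independent tests at the 0.05 level in each of the three 
populations; the function name is hypothetical. 

```python
def expected_false_positives(n_markers, alpha, n_phases):
    """Expected number of null markers still significant after each phase,
    assuming independent tests at level alpha in independent populations."""
    return [n_markers * alpha ** k for k in range(1, n_phases + 1)]


# Assumed marker count (the text says "approximately 30,000").
expected = expected_false_positives(30_000, 0.05, 3)
# Counts reported for the first, second, and third phases.
observed = [2847, 372, 133]
excess = [obs - exp for obs, exp in zip(observed, expected)]
```

Under these assumptions about 1,500, 75, and 4 markers would be 
expected to pass the first, second, and third phases by chance alone, 
against the 2,847, 372, and 133 markers observed. 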

[Table 4] 

Specific data serving as a basis for Table 4 are shown 
in Table 5. As an example, this table classifies the region 
determined by each MS marker as positive (+) (judged 
as having significant disease association in the rheumatoid 
arthritis group (P) as compared with the normal individual 
group (C)) or as negative (-). For example, "+/+" means that 
both alleles are positive, and "+/-" means that one of the 
alleles is positive, according to the classification. The use 
of this table allows, for example, the possibility of 
rheumatoid arthritis onset to be quantified by grading each 
test subject according to a specific algorithm on the basis of 
these numeric values. In the table, "o" denotes mistyping. 

[Table 5] 

The seven most significant markers in the list of Table 
4 were also significant after Bonferroni's correction 
(Pc<0.05). Therefore, in this Example, SNP genotyping was 
focused on these candidate regions. 
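
Bonferroni's correction, which yields the corrected value Pc, 
multiplies each P-value by the number of tests performed. The 
following Python sketch shows the calculation; the function name and 
the example P-values are hypothetical and are not taken from Table 4. 

```python
def bonferroni(p_values, n_tests=None):
    """Return corrected P-values (Pc): each P multiplied by the number
    of tests performed and capped at 1.0."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


# Hypothetical P-values corrected for 23 tests.
pc = bonferroni([0.0001, 0.001, 0.01], n_tests=23)
significant = [value < 0.05 for value in pc]
```

In this Example the multiplier is stated to be the total number of 
contingency tables tested, which is passed here as `n_tests`. 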

SNP Genotyping: 

Among the seven most significant markers, four (i.e., 
the first, second, third, and fifth) were located in the HLA 
region on 6p21.3 (Figure 3), whereas the fourth, sixth, and 
seventh significant markers were located on 11q13.4, 10p13, 
and 14q23.1, respectively (cytobands are designated under the 
NCBI build 30). 

SNPs in the neighborhoods of these candidate regions were 
selected from the dbSNP database on the NCBI homepage and the 
JSNP database on the homepage of The Institute of Medical 
Science, The University of Tokyo. These SNPs were genotyped 
using the TaqMan assay or direct sequencing. The TaqMan assay 
was conducted using the standard protocol of the ABI PRISM 
7900HT Sequence Detection System (Applied Biosystems) equipped 
with the 384-Well Block Module and Automation Accessory. The 
direct sequencing of the PCR products was conducted according 
to a standard approach using the ABI3700 DNA analyzer (Applied 
Biosystems). In the HLA region, additional SNPs were selected 
from the IkBL to C4B genes in order to verify the previously 
reported RA association around the centromeric end of the HLA 
class III region. See Table 6 for the details of the selected 
SNPs. 

Genotyping was conducted on 165 SNPs in the case and 
control populations used in the MS typing. Of these SNPs, 
41 were neither polymorphic nor STSs (sequence tagged sites) 
(see Table 6) and were therefore excluded from subsequent 
analysis. Among the remaining 124 SNPs, 54 were statistically 
significant by case-control association analysis (P<0.05) 
(Table 7). LD block structures were predicted for these 124 
SNPs by the EM algorithm (Figure 2), and case-control 
association analysis using haplotypes in each block was 
conducted according to this algorithm (Table 8). To replicate 
these SNP allelic associations, these 54 positive SNPs were 
genotyped in additional populations composed of 565 case 
individuals and 565 control individuals. Finally, 45 positive 
SNPs were obtained in the combined (n=2x940) population 
consisting of all the samples used in this experiment. Among 
these positive SNPs, 24 were also significant (Pc<0.05) after 
Bonferroni's correction (Table 7). 
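
To illustrate how an EM algorithm estimates haplotype frequencies 
from unphased genotypes, the following Python sketch treats the 
simplest case of two biallelic SNPs and then derives |D'| from the 
estimates. It is not the program used in this Example: the function 
names, the restriction to two SNPs, and the genotype counts are 
assumptions made for illustration, and real LD blocks span more loci. 

```python
def em_two_snp_haplotypes(g, n_iter=100):
    """g[i][j] is the number of individuals carrying i copies of allele A
    at SNP 1 and j copies of allele B at SNP 2 (i, j in 0..2).
    Returns estimated haplotype frequencies (pAB, pAb, paB, pab)."""
    n_hap = 2 * sum(sum(row) for row in g)
    # Haplotype counts from genotypes whose phase is unambiguous.
    base_AB = 2 * g[2][2] + g[2][1] + g[1][2]
    base_Ab = 2 * g[2][0] + g[2][1] + g[1][0]
    base_aB = 2 * g[0][2] + g[1][2] + g[0][1]
    base_ab = 2 * g[0][0] + g[1][0] + g[0][1]
    dh = g[1][1]  # double heterozygotes: phase is AB/ab or Ab/aB
    w = 0.5       # current probability that a double heterozygote is AB/ab
    for _ in range(n_iter):
        # M-step: update frequencies from expected haplotype counts.
        pAB = (base_AB + w * dh) / n_hap
        pab = (base_ab + w * dh) / n_hap
        pAb = (base_Ab + (1 - w) * dh) / n_hap
        paB = (base_aB + (1 - w) * dh) / n_hap
        # E-step: update the phase probability of double heterozygotes.
        cis, trans = pAB * pab, pAb * paB
        if cis + trans == 0:
            break
        w = cis / (cis + trans)
    return pAB, pAb, paB, pab


def d_prime(pAB, pAb, paB, pab):
    """Absolute normalized disequilibrium |D'| between the two SNPs."""
    pA, pB = pAB + pAb, pAB + paB
    d = pAB - pA * pB
    if d >= 0:
        d_max = min(pA * (1 - pB), (1 - pA) * pB)
    else:
        d_max = min(pA * pB, (1 - pA) * (1 - pB))
    return 0.0 if d_max == 0 else abs(d) / d_max


# Hypothetical genotype counts for 40 individuals.
genotypes = [[10, 0, 0], [0, 20, 0], [0, 0, 10]]
freqs = em_two_snp_haplotypes(genotypes)
ld = d_prime(*freqs)
```

Only double heterozygotes are ambiguous in the two-SNP case, so the 
E-step reduces to a single phase probability; with more SNPs the same 
idea applies to every genotype compatible with several haplotype pairs. 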

Hereinafter, the analysis result of each chromosome will 
be described. 

6p21.3 

In the HLA region on 6p21.3, 28 of 71 polymorphic SNPs 
were statistically significant (Pc<0.05) in the first test. 
Preliminary genotyping on HLA-DRB1 revealed that the 
HLA-DRB1*0405 allele was most significant (P=1.3x10"^^). The 
result was, as expected, consistent with many previous reports 
on Japanese populations (Wakitani, S. et al., Br. J. Rheumatol., 
36, 630 (1997); and Shibue, T. et al., Arthritis Rheum., 43, 
753 (2000)) and demonstrated that the method used in the present 
invention is effective for detecting the association of 
susceptibility genes with RA. In addition to HLA-DRB1, the 
association of the IkBL gene (MIM*601022) promoter SNP 
rs3219185 was also reproduced (P=5.4' 10'^), albeit with a 
relatively low frequency of the minor allele. Moreover, 
strong association was seen around the NOTCH4 (MIM*164951) 
and TNXB (MIM*600985) genes, which were approximately 250 kb 
and 300 kb, respectively, distant from HLA-DRB1. 

The NOTCH4 gene is a proto-oncogene with epidermal 
growth factor (EGF) repeats. NOTCH4 encodes a large 
transmembrane receptor predicted to be involved in the signal 
transduction of cell proliferation, cell differentiation, and 
angiogenesis (Yung Yu, C. et al., Immunol. Today, 21, 320 
(2000)). In NOTCH4, nine SNPs were statistically significant, 
among which two caused amino acid substitutions. Among these 
nine SNPs, rs2071282, the SNP in exon 4, exhibited the strongest 
association (P=3.1x10"®) and caused a Leu203Pro substitution at 
the fourth EGF repeat in the extracellular domain of NOTCH4. On 
the other hand, rs915894 in exon 3 was moderately significant 
(P=0.044) and caused a Lys116Gln substitution at the third EGF 
repeat. 

The TNXB gene encodes an extracellular matrix protein 
with 34 fibronectin type III-like (FNIII) and 18 EGF repeats 
and participates in at least one essential function, 
collagen deposition in connective tissues (Mao, J. R. et al., 
Nat. Genet., 30, 421 (2002)). In TNXB, five SNPs were 
statistically significant, of which four caused amino acid 
substitutions. Among these five SNPs, rs185819 in exon 10 
exhibited the strongest association (P=6.8x10"^) and caused a 
His1248Arg substitution at the seventh FNIII repeat. Other SNPs, 
rs2075563 (Glu3260Lys) in exon 29, rs2269428 (His2363Pro) in 
exon 21, and rs3749960 (Phe2300Tyr) in exon 20, were also 
significant and located in the 26th, 18th, and 17th FNIII 
repeats, respectively. 

These six positive SNPs were finally confirmed in the 
combined (n=2x940) population (Table 6). Further, haplotype 
analysis supported these results for IkBL, NOTCH4, and TNXB 
(Table 7), indicating that no block in any of these genes 
contained a common haplotype with greater risk than that of 
the single SNP in that gene. When multiple logistic regression 
analysis was conducted for the SNPs in IkBL, TNXB, and NOTCH4 
together with those in HLA-DRB1, three loci, DRB1*0405 
(ORs=2.29-8.84), rs3219185 in IkBL (ORs=1.16-2.67), and 
rs185819 in TNXB (ORs=1.00-1.62), were significant (P<0.05) 
in a partially recessive model. Two loci, DRB1 (ORs=2.16-4.69) 
and TNXB (ORs=1.02-1.84), were significant in a partially 
dominant model. On the other hand, when the analysis was 
limited to the shared epitope (SE) of DRB1, SE (ORs=1.79-3.85), 
IkBL (ORs=1.11-2.54), and rs2071282 in NOTCH4 (ORs=1.13-7.14) 
were significant only in the partially recessive model. These 
results suggested that these loci are independently associated 
with RA in the partially recessive model. 

11q13.4 

The candidate region on 11q13.4 contained nine genes 
including three mitochondria-related genes, MRPL48, UCP2, and 
UCP3. Although MRPL48 was recently found as a gene having 
homology to mammalian mitochondrial ribosomal proteins (MRPs) 
(Zhang, Z. and Gerstein, M., Genomics, 81, 468 (2003)), its 
function is still unknown. UCP2 (MIM*601693) and UCP3 
(MIM*602044) encode transporter proteins on the inner 
mitochondrial membrane and participate in energy consumption. 
UCP2 is also known as a susceptibility gene for obesity and 
diabetes. The RAS-associated protein gene RAB6A (MIM*179513) 
was found centromeric to MRPL48. Further, three novel genes, 
FLJ11848, LOC374407, and DKFZP586P0123, were located in the 
region. FLJ11848 has WD40 repeats and widely 
participates in cell-cell interaction (Smith, T. F. et al., 
Trends Biochem. Sci., 24, 181 (1999)). LOC374407 has been 
found to have homology to heat shock protein 40 homolog (HSP40 
homolog) and structural similarity to spermatogenesis 
apoptosis-related protein. DKFZP586P0123 has one protein 
kinase C conserved region. 

In these genes, 16 of 25 polymorphic SNPs were 
statistically significant in the first test. Although these 
positive SNPs were scattered over the region tested, the most 
significant associations (P=0.00015) were observed in two SNPs, 
rs1792174 in the 5'-UTR and rs1792160 in intron 3 of MRPL48. 
MRPL48 also had two other positive SNPs, rs1792193 (P=0.003) 
in intron 5 and rs1051090 (P=0.007) in the 3'-UTR. Positive 
SNPs were also observed in all of the other genes UCP2, UCP3, 
RAB38 and FLJ11848. 
However, only one common haplotype in the block b2 containing 
MRPL48 and FLJ11848 exhibited significant association as 
strong as the single SNP in MRPL48. These positive SNPs in 
MRPL48 were finally confirmed after Bonferroni's correction 
in the combined population (Table 7). On the other hand, 
rs1527302 in DKFZP586P0123 was significant (P=0.00078) both 
in the first test and after haplotype analysis. However, the 
SNP allelic association was not confirmed in the combined 
population. These results suggested that other causative 
SNPs are present in the block b2. 

10p13, 14q23.1, and PADI4 

The candidate region on 10p13 had two genes, DKFZP761F241 
and optineurin (OPTN). Three SNPs in the DKFZP761F241 gene 
were statistically significant in the first test but 
were not confirmed after correction in the combined population. 
No common haplotype in this region remained significant after 
Bonferroni's correction in each population. 

On the other hand, the candidate region on 14q23.1 
contained only the reticulon 1 gene (RTN1; MIM*600865), which 
encodes the neuroendocrine-specific protein group. Even after 
Bonferroni's correction in the combined samples, rs2182138 
in intron 3 of RTN1 was still statistically significant 
(P=0.0002). In neither region did a common haplotype remain 
significant after correction. 

Further, in the PADI4 gene that appeared to be a 
susceptibility gene for RA by the candidate gene approach 
(Non-Patent Document 5), four positive SNPs, padi89 (P=0.002), 
padi90 (P=0.004), rs874881 (P=0.002), and rs2240340 (P=0.002), 
were replicated in the populations of this Example. D1S1144i, 
a CA microsatellite marker in intron 6 of the PADI4 gene, was 
confirmed to be included in the RA marker set and to exhibit 
slight significance (P=0.008) but low associated allele 
frequency (P=0.037 in the control population). 

Expression Analysis: 

To study the expression patterns of these genes in various 
tissues including synovial cells, we performed quantitative 
reverse transcription-PCR (QRT-PCR) using RNA from these 
tissues. 

Total RNA was isolated by ISOGEN (Nippon Gene) from 
synovial membranes surgically obtained from eight RA and four 
osteoarthritis (OA) patients. We also isolated total RNA from 
a synovial cell line (SW982) provided by American Type Culture 
Collection (ATCC). Other RNAs from various tissues were 
obtained commercially from Clontech, Invitrogen, Origene, 
and Stratagene. We evaluated the quality and quantity of these 
RNAs by use of the Agilent 2100 Bioanalyzer (Agilent) and 
confirmed their quantities by the RiboGreen RNA fluorescence 
assay (Molecular Probes). Complementary DNAs were synthesized 
from these total RNAs using random hexamers and the TaqMan 
reverse transcription reagents kit (Applied Biosystems). We 
obtained cDNA-specific primers and probes by 'Assay-by-Design 
(AbD)' for the ten genes tested and by 'Assay-on-Demand 
(AoD)' for GAPD, used as a housekeeping control gene, all of 
which were provided by Applied Biosystems. After preliminary 
experiments, 210 nM probes, 756 nM primers, and 0.48 ng/µl 
cDNA at the final concentration in a 50 µl reaction volume were 
used in 96-well reaction plates on the ABI PRISM 7900 according 
to the standard approach recommended by Applied Biosystems. 
Each plate was processed three times to calculate the average 
and SD for each sample. The estimated quantity was calculated 
each time using a standard curve in each well. All quantity 
data normalized to GAPD were tested by Smirnov's test with 
a 5% significance level. After the reciprocal transformation 
of all the normalized quantity data, Student's t-test was 
conducted for expression levels between RA and OA synovial 
tissues. 
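
The normalization and comparison steps described above can be 
sketched in Python as follows. This is an illustrative outline under 
stated assumptions, not the analysis actually run: the function names 
and all quantities are hypothetical, Smirnov's test is omitted, and 
only the t statistic and degrees of freedom are computed (the P-value 
would then be read from a t distribution). 

```python
from math import sqrt
from statistics import mean, variance


def normalize(target_qty, gapd_qty):
    """Express each target quantity relative to GAPD in the same sample."""
    return [t / g for t, g in zip(target_qty, gapd_qty)]


def reciprocal(values):
    """Reciprocal transformation applied before the t-test."""
    return [1.0 / v for v in values]


def student_t(x, y):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, df


# Hypothetical quantities for eight RA and four OA synovial samples.
ra = normalize([4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.2, 4.6], [2.0] * 8)
oa = normalize([2.2, 1.9, 2.5, 2.1], [2.0] * 4)
t_stat, dof = student_t(reciprocal(ra), reciprocal(oa))
```

With eight RA and four OA samples the comparison has ten degrees of 
freedom, so small differences in group means are hard to establish. 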

The consistently high expression of NOTCH4 in the lung 
and of TNXB in the adrenal gland was observed (Figure 3a). 
Our results also showed that all the genes were expressed in 
the RA synovial cells. TNXB and NOTCH4 had significantly high 
expression levels in the RA synovial cells, whereas RTN1 had 
the lowest level. We also compared the expression levels of 
these genes between RA and OA synovial cells. The expression 
levels of the MRPL48 (P=0.049) and DKFZP761F241 (P=0.027) genes 
exhibited a relatively significant difference between the RA 
and OA synovial cells by Student's t-test (Table 9 and 
Figure 3b). MRPL48 expression in the RA synovial tissue was 
approximately twice that in the OA tissue. Three-quarters 
of the RA tissue donors were homozygous for a positive haplotype 
in the block b2 of the MRPL48 locus. 

Statistical Analysis: 

To calculate P-values, two types of Fisher's exact 
test were used: one for the 2x2 contingency tables for each 
individual allele and one for the 2xm contingency tables for 
each locus. In this context, m refers to the number of alleles 
observed in a population. To perform Fisher's exact test for 
the 2xm contingency tables, a Markov chain/Monte Carlo 
simulation method was adopted. We simply presented "allelic" 
but not phenotypic association for the 2x2 contingency tables 
for MS, SNP and haplotype. These P-values were corrected by 
Bonferroni's correction, wherein the coefficient was the total 
number of the contingency tables tested. These analyses were 
conducted with the software package MCFishman. Other basic 
statistical analyses including multiple logistic regression 
analysis and the Mantel-Haenszel test were performed using the 
SPSS program package and Microsoft Excel (trade name). We 
predicted LD block structures for these SNPs by using the 
confidence intervals of the D' value as an LD measure (Gabriel, 
S. B. et al., Science, 296, 2225 (2002); and Dawson, E. et 
al., Nature, 418, 544 (2002)). Moreover, haplotypes in each 
block and their frequencies were estimated by the EM and Clark 
algorithms. Finally, to evaluate the reliability of 
haplotypes in each block, the 95% confidence interval was 
calculated for each haplotype frequency by bootstrap 
resampling of up to 2000 times on the basis of the estimated 
haplotype frequencies, which was implemented in the Right 
program (Mano, S. et al., Ann. Hum. Genet., in press). 
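
For the 2xm tables, the idea of a simulation-based exact test can be 
sketched as follows. Note that this sketch substitutes plain Monte 
Carlo permutation of case/control labels for the Markov chain/Monte 
Carlo scheme named above, and it is not the MCFishman software; the 
function names, the seed, and the allele counts are hypothetical. 

```python
import random
from math import lgamma


def log_table_prob(case, control):
    """Log-probability of a 2 x m table with fixed margins, up to an
    additive constant that depends only on the margins."""
    return -sum(lgamma(n + 1) for n in list(case) + list(control))


def monte_carlo_fisher_2xm(case, control, n_sim=10_000, seed=0):
    """Estimate the exact-test P-value for allele counts in cases and
    controls by shuffling chromosomes between the two groups."""
    rng = random.Random(seed)
    m, n_case = len(case), sum(case)
    # One entry per chromosome, holding its allele index.
    alleles = [k for k, (a, b) in enumerate(zip(case, control))
               for _ in range(a + b)]
    observed = log_table_prob(case, control)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(alleles)
        sim_case = [0] * m
        for k in alleles[:n_case]:
            sim_case[k] += 1
        sim_control = [a + b - s for a, b, s in zip(case, control, sim_case)]
        # Count tables that are no more likely than the observed one.
        if log_table_prob(sim_case, sim_control) <= observed + 1e-9:
            hits += 1
    # Adding one to numerator and denominator avoids a P-value of zero.
    return (hits + 1) / (n_sim + 1)


# Hypothetical counts for a marker with four alleles (250 chromosomes each).
p_locus = monte_carlo_fisher_2xm([90, 70, 60, 30], [60, 80, 65, 45], n_sim=2000)
```

The resolution of the estimate is limited by the number of simulated 
tables, so very small P-values require correspondingly more 
simulations. 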

In this Example, strong association was found in the TNXB 
and NOTCH4 genes 250 kb distant from HLA-DRB1 in the candidate 
region narrowed down by the MS markers. These genes are known 
to be located in LD blocks evidently different from that of 
HLA-DRB1 (Cullen, M. et al., Am. J. Hum. Genet., 71, 759 (2002); 
and Walsh, E. C. et al., Am. J. Hum. Genet., 73, 580 (2003)). 
In agreement with the multiple logistic regression analysis 
result, the result of the Mantel-Haenszel test also showed that 
positive SNPs in TNXB and NOTCH4 are independent of 
HLA-DRB1*0405 or SE in both the partially dominant and 
partially recessive models (data not shown). Further, the 
candidate region closely coincided with one of the additional 
susceptibility regions previously predicted (Jawaheer, D. et 
al., Am. J. Hum. Genet., 71, 585 (2002)). TNXB is known as a 
causative gene of one type of Ehlers-Danlos syndrome 
(MIM*600985) characterized by dysfunction in connective tissues 
including joints. Its gene products participate in connective 
tissue functions and structures via the deposition of collagens 
of various types (Mao, J. R. et al., Nat. Genet., 30, 421 
(2002)), probably including the synovial tissues shown here. 
Type II collagen-induced arthritis in mice is known to mimic 
rheumatoid arthritis (Moore, A. R., Methods Mol. Biol., 225, 
175 (2003)). 

The present inventors believe that the amino acid 
exchanges of the TNXB gene product serve as functional factors 
for RA via a hypothetical pathway associated with collagen 
metabolism. In recent years, it was reported that the NOTCH4 
gene product might participate in overproliferation via tumor 
necrosis factor (TNF) of synovial cells and in RA (Ando, K. 
et al., Oncogene, 22, 7796 (2003)). However, large parts of 
NOTCH4 function are still unclear. 

On 11q13.4, although the function of MRPL48 is still unknown, 
its expression pattern indicated the association of this gene 
with RA. The candidate region 11q13.4 contains other 
interesting genes: RAB6A, FLJ11848, UCP2, and UCP3. As with 
this region, even though further association analysis for 10p13 
and 14q23.1 requires higher-density SNP markers, it is 
interesting that regions on other chromosomes were found by 
the method of the present invention. These results chiefly 
suggested that our marker set and method are highly practicable 
and applicable to other complex diseases, at least to oligogenic 
diseases with major genes such as HLA-DRB1 in RA. 

Interestingly, our data suggested that the seven most 
significant MS markers tend to be individually positioned in 
particular LD blocks (Figure 3). These markers 
were observed on the "Clark blocks" rather than the "EM blocks". 
In many cases, positive MS alleles were obviously associated 
with positive SNP haplotypes in these blocks. 



